ABSTRACT

To find interesting structure in networks, community detec- 
tion algorithms have to take into account not only the net- 
work topology, but also dynamics of interactions between 
nodes. We investigate this claim using the paradigm of syn- 
chronization in a network of coupled oscillators. As the net- 
work evolves to a global steady state, nodes belonging to the 
same community synchronize faster than nodes belonging to 
different communities. Traditionally, nodes in network syn- 
chronization models are coupled via one-to-one, or conserva- 
tive interactions. However, social interactions are often one- 
to-many, as for example, in social media, where users broad- 
cast messages to all their followers. We formulate a novel 
model of synchronization in a network of coupled oscillators 
in which the oscillators are coupled via one-to-many, or non- 
conservative interactions. We study the dynamics of different 
interaction models and contrast their spectral properties. To 
find multi-scale community structure in a network of inter- 
acting nodes, we define a similarity function that measures 
the degree to which nodes are synchronized and use it to 
hierarchically cluster nodes. We study real-world social net- 
works, including networks of two social media providers. To 
evaluate the quality of the discovered communities in a so- 
cial media network we propose a community quality metric 
based on user activity. We find that conservative and non- 
conservative interaction models lead to dramatically differ- 
ent views of community structure even within the same net- 
work. Our work offers a novel mathematical framework for 
exploring the relationship between network structure, topol- 
ogy and dynamics. 

INTRODUCTION 

Modular structure is an important characteristic of complex
real-world networks, including social networks, which are
composed of communities and sub-communities of interconnected
individuals, and biological networks, which are often organized
within functional modules [14, 15]. Conductance minimization [5]
and modularity maximization [5] are some of the most popular
methods for community detection. However, these are combinatorial
approaches that have been shown to be NP-hard or NP-complete. As
a result, researchers resort to heuristics and approximation
algorithms when applying these methods to the community detection
problem. On the other hand, decentralized algorithms based on
local computation have been shown to provide scalable so- 
lutions to combinatorial problems [23]. Motivated by this 
idea, we cast community detection as a decentralized com- 
putation problem in which a network of locally interacting 
agents over time finds a global solution that corresponds to 
the community division of the network. 

A network's modular structure is a product of both the
topology of its underlying connections and also its function, 
which is determined by the dynamic processes taking place 
on the network. Nodes are not static but change their state 
or activity levels in response to the actions of neighbors. Ex- 
isting community detection algorithms focus solely on net- 
work topology and ignore dynamic processes taking place on 
the network, or they implicitly assume that interactions be- 
tween nodes are mediated by a conservative process similar 
to heat diffusion [3], [9]. However, this assumption may not
be justified for social networks [6]. 

In this paper we study a framework for multi-scale analy- 
sis of community structure in networks that explicitly takes 
interactions into account. We consider a static network of 
active nodes (or agents), who can affect the state or activ- 
ity of their neighbors through interactions. We differentiate 
between one-to-one interactions, henceforth referred to as 
conservative, and one-to-many, or non-conservative interac- 
tions. Examples of the former include money exchange, Web 
surfing, and diffusion in physical systems. Non-conservative
interactions include broadcast-based interactions that lead 
to information diffusion, epidemics, and other social phe- 
nomena. These local interactions cause nodes' activity to 
become more similar. In a social network, for example, fre- 
quent contact leads to similarity of behavior among friends. 
Over time, communities composed of individuals who act 
in a similar manner will emerge. As another example, con- 
sider a population of fireflies who have characteristic light 
flashing patterns to help males and females recognize each 
other. Some firefly species exhibit synchronous flashing, during
which an individual's flashing pattern can affect that of its
neighbors, leading all nearby fireflies to flash in unison [18].
The Kuramoto model is a simple mathematical description 
of distributed synchronization in this and other physical and 
biological systems [10]. The model considers a network of
coupled oscillators, in which the phase of each oscillator is
affected by the phases of its neighbors. While the network
as a whole eventually reaches a fully synchronized state, it 
does so in stages, with nodes belonging to the same commu- 
nity synchronizing faster than nodes belonging to different 
communities [2]. 

We propose a new model of distributed synchronization 
based on non-conservative interactions. We show that in this
interaction model, nodes synchronize much faster than in the 
conservative interaction model. We use dynamic interaction 
models to explore community structure of several networks, 
including a benchmark social network and large real-world 
networks from social media sites. We investigate how the 
dynamics of synchronization, and the community structure 
that emerges from it, are affected by the nature of interac- 
tions. Our study reveals substantial differences in network 
structure discovered by different interaction models. We find 
a complex layered organization of the real-world networks.
While these networks exhibit the 'core and whiskers'
organization found in other real-world social and information
networks [11], with a giant core and multiple small communities
(whiskers) weakly connected to the core, this constitutes
but one layer of the organization. As we peel away the
whiskers layer to examine the core, we find a similar 'core 
and whiskers' structure in the new layer, and so on. We
evaluate community division by measuring similarity of community
members, introducing an activity-based measure of similarity
for social media users. We show that the non-conservative
interaction model finds many more valid communities.

The principal contribution of this paper is a formal framework
that consolidates and generalizes approaches to community
detection in complex networks. We apply this framework
to study real-world networks. The specific contributions
that support this perspective are:

• A novel model for locally interacting nodes coupled via 
non-conservative interactions 

• A methodology for hierarchical community detection 
based on synchronization similarity and an activity- 
based measure of community quality 

• Detailed investigation of interaction models on real- 
world networks revealing important differences between 
them within a layered 'onion-like' organization of com- 
plex networks 

While it may seem counter-intuitive that a network's com- 
munity structure depends on anything but its topology, as 
we show in this paper, different dynamic processes running 
on the same topology can lead to different views of network 
structure. In reality, a network's structure, topology, and
dynamics are intricately interconnected, and our work offers a
formal framework to begin exploring these connections.

NETWORK INTERACTION MODELS 

We consider a network of active nodes (e.g., agents or
actors), each interacting locally with its neighbors.
Interactions between nodes determine the dynamic process taking
place on the network. Consider financial exchange networks
in which nodes distribute money among their network
neighbors. The interactions that give rise to the financial
exchange can be called conservative, since they neither increase
nor decrease the amount of money exchanged. Web surfing,
communicating via phone calls, and other one-to-one interactions
are conservative, because, as in the case of a Web
surfer, at any time the surfer can browse only one page,
and the total probability of finding the surfer on some Web
page remains constant. We contrast these to non-conservative
interactions, which do not preserve the quantity
exchanged. Take, as an example, a virus spreading through
a social network. A person (node) will get infected with a
virus through her infected friends, but the amount of the
virus present in the network will increase because of these
interactions (or decrease as infected people become cured).
Social processes based on one-to-many interactions, such
as users broadcasting messages in online social media, are
also non-conservative in nature. While the
conservative/non-conservative dichotomy might not capture the full
range of possible interactions in a network, we begin our
investigation here because this dichotomy can be described
mathematically. Moreover, to keep the mathematics tractable,
we focus our analysis on linear interactions.

Physicists have studied the dynamics of interacting entities
in an attempt to understand collective behavior of complex
networks. The Kuramoto model [10] was proposed as
a simple model for how global synchronization may arise 
in physical and biological systems. The model considers a 
network of phase oscillators, each coupled to its neighbors 
through the sine of their phase differences. The Kuramoto 
model has a fully synchronized steady state in which the 
phase difference between all oscillators is zero. 

As we show below, the Kuramoto model (at least in the 
linear case) assumes that interactions between nodes are me- 
diated by a conservative process similar to heat diffusion, 
which is mathematically related to the random walk. However,
not all social phenomena, including epidemic spread
and information diffusion, admit such descriptions [6]. In
this section we introduce a new model of distributed syn- 
chronization based on non-conservative interactions. 

Conservative Interaction Models 

The Kuramoto model is written as: 

$$\frac{d\theta_i}{dt} = \omega_i + \sum_{j \in neigh(i)} K_{ij} \sin(\theta_j - \theta_i) \tag{1}$$

where $\theta_i$ is the instantaneous phase of the $i$th oscillator, $\omega_i$
is its natural frequency, and $K_{ij}$ is the coupling constant
that describes the strength of interaction with the $j$th neighbor.
The neighborhood of node $i$, $neigh(i)$, contains the nodes
that share an edge with node $i$. For small phase differences,
$\sin\theta \approx \theta$ and the linearized version of the Kuramoto model
can be written as:

$$\frac{d\theta_i}{dt} = \omega_i + \sum_{j \in neigh(i)} K_{ij} (\theta_j - \theta_i) \tag{2}$$

In a more general sense, we treat $\theta_i$ as some extrinsic property
of node $i$ (agent), which is dynamic and can be affected
by the local interaction with the neighbors. The quantity $\omega_i$
can then be perceived as its intrinsic property, which is not
affected by external factors and remains constant over time.
For example, $\theta_i$ could represent the opinions of an individual
agent $i$, and $\omega_i$ his intrinsic beliefs. Though his opinions
depend on his intrinsic beliefs, they may change over time
as the result of interactions with neighbors. Though rather
simplified, we believe that this abstract model provides a
useful framework to study social phenomena.



For convenience, we rewrite Eq. 2 in vector form:

$$\frac{d\theta}{dt} = \omega - K \cdot L\,\theta \tag{3}$$



Here $\omega$ is the vector of length $N$ of intrinsic properties of
nodes, $\theta$ is the vector of their extrinsic properties, and $K$ is a
matrix of pairwise coupling constants between nodes. $K \cdot L$
is the dot product of $K$ and $L$. Operator $L$ is the Laplacian
of the graph, $L = D - A$. Here $A$ is the adjacency matrix
of the unweighted, undirected graph, such that $A[i,j] = 1$ if
there exists an edge between $i$ and $j$; otherwise, $A[i,j] = 0$.
Matrix $D$ is the diagonal matrix where $D[i,i] = \sum_j A[i,j]$
and $D[i,j] = 0 \ \forall i \neq j$.

The model describes evolution of the extrinsic properties
of a population of nodes (or agents). After some time, the
network reaches a steady state, and interactions no longer
change the property of any node, i.e., $\theta_i(t) = \theta_i(t+1)$. In
the opinion formation example, it would mean that after
some period, individual opinions no longer change. For
$\omega_i = \omega_j$, in the steady state $\theta_i(t) = \theta_j(t)$ for all $i, j$. In other
words, the extrinsic properties of all the nodes are the same
in the steady state. In the context of oscillators, this means
that their phases are equal and they are synchronized.

To see why the linearized Kuramoto model is conservative,
we imagine that interactions result in a node exchanging some
content with its neighbors. Imagine that at time $t$, node $i$ has
an amount $\theta_i(t)$ of content and produces some amount $\omega_i$
for itself and some amount $d_i\theta_i(t)$ for its neighbors, which
it then transfers to its $d_i$ neighbors (transmission is denoted
by the negative sign in Eq. 2). Each neighbor receives $1/d_i$ of the
transmitted amount (reception is denoted by the positive sign in
Eq. 2). Thus, whatever is produced is completely transferred
to other nodes in the system.
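As a rough illustration of the conservative dynamics described above, the following sketch integrates Eq. 2 with a forward-Euler scheme on a toy network and checks the two properties discussed: the phases converge to a common value, and the total amount of content does not change. The toy graph, the uniform coupling `c`, the step size `dt`, and the function name `simulate_conservative` are assumptions made for this example only.

```python
def simulate_conservative(adj, theta0, c=1.0, dt=0.05, steps=2000):
    """Forward-Euler integration of Eq. 2 with omega_i = 0 and K_ij = c (assumed)."""
    theta = list(theta0)
    for _ in range(steps):
        # Each node moves toward its neighbors: c * sum_j (theta_j - theta_i).
        theta = [th + dt * c * sum(theta[j] - th for j in adj[i])
                 for i, th in enumerate(theta)]
    return theta

# Hypothetical toy network: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
theta0 = [0.5, -1.0, 2.0, -2.5, 1.5, 0.0]
theta = simulate_conservative(adj, theta0)

print(max(theta) - min(theta))   # spread of the phases, close to zero
print(sum(theta0), sum(theta))   # total content, equal up to rounding
```

Because every exchange term appears once with each sign, the sum over all nodes is unchanged at every step, which is the discrete counterpart of the conservation argument in the text.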

The Kuramoto model is just one of a family of conservative
interaction models. The model would change based on the
nature of interactions. In the case when the new amount of
content produced by node $i$ at each time step is $\theta_i$ (instead of
$d_i\theta_i$ in Eq. 3), the conservative interaction model becomes:

$$\frac{d\theta}{dt} = \omega - K \cdot (I - AD^{-1})\,\theta \tag{4}$$



Here, $D^{-1}$ is the inverse of the diagonal matrix $D$. Another
conservative interaction model could be framed using the
normalized Laplacian operator:

$$\frac{d\theta}{dt} = \omega - K \cdot (I - D^{-1/2} A D^{-1/2})\,\theta \tag{5}$$



The normalized Laplacian operators in Eqs. 4 and 5 are often
used to describe random walk-based processes. Eq. 3
has been used to describe a variety of conservative systems.
When $\omega = 0$ and $K[i,j] = c$, it measures electric potential
in a network of capacitors of unit capacitance, with
one plate of each capacitor grounded and the other plate
connected according to the graph structure, with each edge
corresponding to a resistor of resistance $1/c$. The same equation
(with $\omega = 0$, $K[i,j] = c$) has been used to model (discrete)
diffusion of heat and fluid flow in networks and serves
as the basis of diffusion kernels over discrete structures in
machine learning algorithms [9].
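The three conservative operators can be written out explicitly for a small graph. The sketch below builds $L = D - A$, $I - AD^{-1}$ and $I - D^{-1/2}AD^{-1/2}$ as nested lists and checks that the columns of the first two sum to zero, which is what keeps the total content fixed. The path graph and the function name `conservative_operators` are assumptions for illustration.

```python
def conservative_operators(A):
    """Return L = D - A, I - A D^-1 and I - D^-1/2 A D^-1/2 as nested lists."""
    n = len(A)
    d = [sum(row) for row in A]  # node degrees; assumes no isolated nodes
    L = [[(d[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
    rw = [[(1.0 if i == j else 0.0) - A[i][j] / d[j] for j in range(n)]
          for i in range(n)]
    sym = [[(1.0 if i == j else 0.0) - A[i][j] / (d[i] * d[j]) ** 0.5
            for j in range(n)] for i in range(n)]
    return L, rw, sym

# Hypothetical toy network: the path 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
L, rw, sym = conservative_operators(A)

# Column sums of L and of I - A D^-1 vanish, so d/dt of the total content is zero.
col_sums_L = [sum(L[i][j] for i in range(3)) for j in range(3)]
col_sums_rw = [sum(rw[i][j] for i in range(3)) for j in range(3)]
print(col_sums_L, col_sums_rw)
```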

Non-conservative Interaction Models 

In contrast to conservative interaction models, in most
human or biological networks what is produced is not necessarily
completely transferred or distributed among neighbors.
Some portion of it might be dissipated or lost. This
changes the nature of interactions and the resulting dynamics
of the network. We present a model of non-conservative
interactions in undirected networks:

$$\frac{d\theta_i}{dt} = \omega_i + \sum_{j \in neigh(i)} K_{ij} \left(\theta_j - \frac{\alpha}{d_i}\theta_i\right) \tag{6}$$

$$\frac{d\theta}{dt} = \omega - K \cdot (\alpha I - A)\,\theta \tag{7}$$



Here $\alpha$ is a constant and $I$ is the identity matrix. The equation
above introduces a new operator, which we call the
Replicator operator $R = \alpha I - A$. In order for this system to
reach a steady state, $\alpha \geq \lambda_{max}$, where $\lambda_{max}$ is the largest
eigenvalue of the adjacency matrix of the network. Again,
without loss of generality we can take $K_{ij} = c$. Eq. 7 gives
the vector form of the non-conservative model.

As in the conservative interaction model, we can imagine
that at each time step, node $i$ produces some amount $\omega_i$ of
content for itself. In addition, it produces $\alpha\theta_i$ of additional
content and transmits it to the system regardless of the actual
number of neighbors it has (transmission is denoted by the
negative sign in Eq. 6). Each neighbor receives an amount
$\theta_i$ from the system (reception is denoted by the positive sign
in Eq. 6). Thus $(\alpha - d_i)\theta_i$ of the new content created by
node $i$ is not transferred to any neighbor and is lost. This
accounts for non-conservation during interactions. In spite
of non-conservation, the system reaches a steady state where
phases of oscillators no longer change: $\theta_i(t) = \theta_i(t+1)$. In
the steady state, $\theta_i$ is proportional to the $i$th element of the
eigenvector of the adjacency matrix that corresponds to its
largest eigenvalue.
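The steady state of the non-conservative model can be checked numerically. The sketch below estimates $\lambda_{max}$ and its eigenvector by power iteration, integrates Eq. 7 with $\omega = 0$, $K_{ij} = c$ and $\alpha = \lambda_{max}$, and compares the final phases with the eigenvector. The toy graph, the parameter values, and both function names are assumptions introduced for this example.

```python
def principal_eigenpair(adj, iters=2000):
    """Largest adjacency eigenvalue and its eigenvector, by power iteration on A + I."""
    n = len(adj)
    v = [1.0] * n
    scale = 1.0
    for _ in range(iters):
        w = [v[i] + sum(v[j] for j in adj[i]) for i in range(n)]  # (A + I) v
        scale = max(w)
        v = [x / scale for x in w]
    return scale - 1.0, v

def simulate_non_conservative(adj, theta0, alpha, c=1.0, dt=0.05, steps=2000):
    """Forward-Euler integration of Eq. 7 with omega = 0 and K_ij = c (assumed)."""
    theta = list(theta0)
    for _ in range(steps):
        theta = [th + dt * c * (sum(theta[j] for j in adj[i]) - alpha * th)
                 for i, th in enumerate(theta)]
    return theta

# Hypothetical toy network: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
theta0 = [0.5, -1.0, 2.0, -2.5, 1.5, 0.0]

lam_max, v = principal_eigenpair(adj)
theta = simulate_non_conservative(adj, theta0, alpha=lam_max)

ratios = [t / x for t, x in zip(theta, v)]
print(max(ratios) - min(ratios))   # near zero: theta is proportional to v
print(sum(theta0), sum(theta))     # the total content is not conserved
```

The shift by the identity in the power iteration is a design choice to avoid oscillation on bipartite graphs; it does not change the eigenvector.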

Other flavors of the non-conservative interaction model
are also possible. If the amount produced by node $i$ at each
time step is $\theta_t[i]$ (instead of $\alpha\theta_i$ in Eq. 7), another
non-conservative linear interaction model could be:

$$\frac{d\theta}{dt} = \omega - K \cdot (I - \alpha^{-1} A)\,\theta \tag{8}$$

If $\alpha > \lambda_{max}$, then this system would reach equilibrium.



Spectral Properties of Operators 

As we saw above, the linear conservative model naturally
gives rise to the Laplacian operator $L$ (see Eq. 3). This explains
the connection between the spectrum of the Laplacian
and topological properties of synchronized structures
that emerge as the network evolves to the fully synchronized
state. The number of null eigenvalues of $L$ gives the number
of disconnected components of the graph and is the basis
of spectral clustering. The time to reach the steady state is
inversely proportional to the smallest positive eigenvalue of
the Laplacian, and the gaps between consecutive eigenvalues
are related to the relative difference in synchronization time
scales of different modules [5], [1].

The replicator operator $R$ we introduced in Eq. 7 is the
non-conservative counterpart of the Laplacian. Its spectrum
gives us information about topological and temporal scales
of non-conservative dynamical systems. In particular, the
time it takes for the system to reach the steady state is
inversely proportional to the smallest positive eigenvalue of
$R$ (when $\alpha = \lambda_{max}$), as shown in the next section.
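To make the comparison of time scales concrete, the sketch below computes the smallest positive eigenvalue of $L$ and of $R$ (with $\alpha = \lambda_{max}$) for a toy graph. The eigenvalue routine is a textbook cyclic Jacobi iteration, written out only to keep the example free of dependencies; the graph and all names are assumptions. The ordering of the two values on this graph is an observation about the example, not a general guarantee.

```python
import math

def symmetric_eigenvalues(M, sweeps=30):
    """Eigenvalues (ascending) of a small symmetric matrix via cyclic Jacobi rotations."""
    n = len(M)
    a = [[float(x) for x in row] for row in M]
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-14:
                    continue
                # Rotation angle that zeroes a[p][q], kept within [-pi/4, pi/4].
                if a[q][q] != a[p][p]:
                    phi = 0.5 * math.atan(2.0 * a[p][q] / (a[q][q] - a[p][p]))
                else:
                    phi = math.pi / 4
                c, s = math.cos(phi), math.sin(phi)
                for k in range(n):  # update columns p and q (A <- A J)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # update rows p and q (A <- J^T A)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = [[0] * n for _ in range(n)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

deg = [sum(row) for row in A]
L = [[(deg[i] if i == j else 0) - A[i][j] for j in range(n)] for i in range(n)]
lam_max = symmetric_eigenvalues(A)[-1]
R = [[(lam_max if i == j else 0.0) - A[i][j] for j in range(n)] for i in range(n)]

gap_L = min(x for x in symmetric_eigenvalues(L) if x > 1e-9)
gap_R = min(x for x in symmetric_eigenvalues(R) if x > 1e-9)
print(gap_L, gap_R)
```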

Generalized Interaction Model 

Both conservative and non-conservative interaction models
are special cases of the general linearized interaction
model. As we show below, this model generalizes several
community detection methods, such as spectral clustering,
modularity maximization and conductance minimization.

A generalized linear model of interaction can be written
in terms of the operator $C(A)$ of the adjacency matrix $A$:

$$\frac{d\theta}{dt} = \omega - K \cdot C(A)\,\theta \tag{9}$$



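Solving this differential equation we get:

$$\theta(t) = e^{-K \cdot C(A)\,t} \left(\theta_0 - (K \cdot C(A))^{-1}\omega\right) + (K \cdot C(A))^{-1}\omega \tag{10}$$

with $\theta_0$ the initial value of $\theta(t)$ at $t = 0$, and $\omega$ the vector of
natural frequencies.

Let $|V|$ be the number of nodes in the network. Let $X$ be
a $|V| \times |V|$ matrix whose column $X[\cdot,i]$ gives the eigenvector
of $C(A)$ corresponding to eigenvalue $\lambda_i$. Also, let $\Lambda$ be the
diagonal eigenvalue matrix where $\Lambda[i,i] = \lambda_i$. Let $y = X^{-1}$.
Therefore $C(A) = \sum_{i \in \{1,2,\ldots,|V|\}} X[\cdot,i]\,\lambda_i\,y[i,\cdot]$. Eq. 10 with
$\omega = 0$, $K[i,j] = c$ can be rewritten as:

$$\theta(t) = e^{-c\,C(A)\,t}\,\theta_0 = \sum_{i \in \{1,2,\ldots,|V|\}} X[\cdot,i]\,e^{-c\lambda_i t}\,y[i,\cdot]\,\theta_0 = \sum_{i \in \{1,2,\ldots,|V|\}} X[\cdot,i]\,e^{-c\lambda_i t}\,c_i \tag{11}$$

Here $c_i = y[i,\cdot]\,\theta_0$ is a constant. Let $\lambda_1 \le \lambda_2 \le \cdots \le \lambda_{|V|}$.
Let $t_j$ be such that $e^{-c\lambda_i t_j} \approx 0$, $\forall i > j$, and $t_{j+1}$ be such
that $e^{-c\lambda_i t_{j+1}} \approx 0$, $\forall i > j+1$. Therefore, for $t_{j+1} < t < t_j$,
$\theta_t \approx \sum_{i=1}^{j+1} X[\cdot,i]\,e^{-c\lambda_i t}\,c_i$.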
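The mode decomposition in Eq. 11 can be made concrete with the smallest possible case, worked here purely as an illustration: two nodes joined by a single edge, with $C(A) = L$, $\omega = 0$ and $K[i,j] = c$. The initial phases $a$ and $b$ are symbols introduced only for this example.

```latex
L = \begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}, \qquad
\lambda_1 = 0,\ \lambda_2 = 2, \qquad
X = \frac{1}{\sqrt{2}} \begin{pmatrix} 1 & 1 \\ 1 & -1 \end{pmatrix}, \qquad
y = X^{-1} = X

\theta_0 = \begin{pmatrix} a \\ b \end{pmatrix}, \qquad
c_1 = y[1,\cdot]\,\theta_0 = \frac{a+b}{\sqrt{2}}, \qquad
c_2 = y[2,\cdot]\,\theta_0 = \frac{a-b}{\sqrt{2}}

\theta(t) = \frac{a+b}{2} \begin{pmatrix} 1 \\ 1 \end{pmatrix}
          + \frac{a-b}{2}\, e^{-2ct} \begin{pmatrix} 1 \\ -1 \end{pmatrix}
```

The first term is the steady state, with both nodes at the mean of the initial phases. The second term decays at rate $c\lambda_2 = 2c$, which is the sense in which the smallest positive eigenvalue sets the time to synchronize.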

Steady State. 

Let us look at the $\lambda_1 = 0$ case more closely. This arises in
non-conservative interaction when $\alpha = \lambda_{max}$ (Eqs. 7 and 8).
In this case as $t \to \infty$, Eq. 11 reduces to $\theta_{t \to \infty} = X[\cdot,1]\,c_1$,
where $c_1$ is a constant, which is the steady state or equilibrium.
For non-conservative interaction models, $\lambda_1 > 0$ leads
to a trivial equilibrium condition. Considering Eqs. 3, 4, 7
and 8 with $\omega = 0$:

$C = D - A = L$: In this case $X[\cdot,1] \propto \mathbf{1}$ (vector of 1s).
Hence $\theta_{t \to \infty}[i] = \theta_{t \to \infty}[j] \ \forall i, j$. Hence the content
or phase of all nodes is equal at synchronization.

$C = I - AD^{-1}$: Hence $\theta_{t \to \infty}[i] \propto d[i]$, where $d[i]$ is the
degree of node $i$.

$C = I - \frac{1}{\lambda_{max}} A$ or $C = \lambda_{max} I - A = R$: Here $X[\cdot,1] \propto$ the
eigenvector of the adjacency matrix $A$ corresponding
to the largest eigenvalue.

$C = I - \frac{1}{\alpha} A$ or $C = \alpha I - A$, $\forall \alpha > \lambda_{max}$: Here
$\theta_{t \to \infty}[i] = 0 \ \forall i$.

Spectral Clustering and Partitioning.

Note that if $X[\cdot,i]$, $\forall i \in \{1, \ldots, j\}$ is used for clustering,
the conservative interaction models in Eqs. 3, 4 and 5 reduce
to spectral clustering techniques using the Laplacian (Eq. 3) or
normalized Laplacians (Eq. 4 or 5) to find $j$ communities
[21]. [17] showed that naive spectral bisection methods do
not necessarily work. However, for a conservative dynamic
process with $C = D - A$ (Eq. 3 with $\omega = 0$), if vertices are
arranged such that $\theta_t[u_1] \ge \theta_t[u_2] \ge \cdots \ge \theta_t[u_{|V|}]$ and set $S_i$
comprises nodes $u_1, u_2, \ldots, u_i$, then at $t = t_2$ (where
$e^{-c\lambda_i t_2} \approx 0$, $\forall i > 2$), $\min_i \phi(S_i)$ (Fiedler cut) is $O(1/\sqrt{n})$ for bounded
degree planar graphs and a bisector of $O(\sqrt{n})$ can be found
by repeatedly finding Fiedler cuts. (This is one of the very
few theoretical guarantees for spectral partitioning.)

Conductance. 

Finding a partition with a low conductance is closely related
to the conservative interaction model with $C = I - AD^{-1}$
(Eq. 4 with $\omega = 0$). Let $S \subset V$ be a set of vertices.
Let $E(S, \bar{S})$ be the cut size, or the number of edges going from $S$ to $\bar{S}$.
Volume $vol(S)$ is the sum of the degrees of all vertices in $S$.
Conductance is given by $\phi(G) = \min_S \frac{E(S,\bar{S})}{\min(vol(S),\, vol(\bar{S}))}$. The classic
Cheeger inequality states that $2\phi(G) \ge \lambda_2 \ge \frac{\phi(G)^2}{2}$. Therefore,
if this conservative dynamic process starts at node $u$,
i.e., $\theta_0[u] = 1$ for $u \in V$ and $\theta_0[v] = 0 \ \forall v \ne u \in V$,
then $|\theta_t[v] - \theta_\infty[v]| \le e^{-t\phi(G)^2/2} \sqrt{d[v]/d[u]}$, where $d[u]$ is the degree
of node $u$. In other words, if conductance is large, this
dynamic process would reach equilibrium quickly. Let the
nodes be arranged such that $\frac{\theta_t[u_1]}{d[u_1]} \ge \frac{\theta_t[u_2]}{d[u_2]} \ge \cdots \ge \frac{\theta_t[u_{|V|}]}{d[u_{|V|}]}$
at time $t$, and let set $S_i$ comprise nodes $u_1, u_2, \ldots, u_i$.
In this setup, for a set $S$ with volume $vol(S) \le vol(G)/2$ and
$\phi(S) \le \gamma$, where $\gamma$ is a constant, there is a subset $S' \subset S$
with volume $vol(S') \ge vol(S)/2$, such that, if the conservative
dynamic process (Eq. 4) starts at $u \in S'$, then at a suitable
time $t$, $\min_{S_i} \phi(S_i) = O(\sqrt{\gamma \log(vol(S))})$. This shows that by
focusing on cuts determined by the linear ordering of vertices
using $\theta_t$ of the conservative interaction model in Eq. 4, the
conductance of the partition obtained is within a quadratic factor
of the minimum conductance (which is one of the best approximation
guarantees for local partitioning using conductance) [3].
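The ordering-and-sweep procedure described above can be sketched as follows: run the conservative process with $C = I - AD^{-1}$ from a single start node, order nodes by $\theta_t[u]/d[u]$, and keep the prefix set with the lowest conductance. The toy graph, the stopping time (200 steps of size 0.01), and the function names are assumptions for illustration; the sketch makes no claim about the approximation guarantee itself.

```python
def conductance(adj, S):
    """Cut size divided by the smaller of vol(S) and vol(V - S)."""
    S = set(S)
    cut = sum(1 for u in S for v in adj[u] if v not in S)
    vol_S = sum(len(adj[u]) for u in S)
    vol_rest = sum(len(adj[u]) for u in adj) - vol_S
    return cut / min(vol_S, vol_rest)

def diffuse(adj, start, dt=0.01, steps=200):
    """Forward-Euler integration of d(theta)/dt = -(I - A D^-1) theta from a unit mass at `start`."""
    theta = [1.0 if u == start else 0.0 for u in range(len(adj))]
    for _ in range(steps):
        theta = [th + dt * (sum(theta[j] / len(adj[j]) for j in adj[i]) - th)
                 for i, th in enumerate(theta)]
    return theta

def sweep_cut(adj, theta):
    """Order nodes by theta[u] / d[u] and return the prefix set of lowest conductance."""
    order = sorted(adj, key=lambda u: theta[u] / len(adj[u]), reverse=True)
    prefixes = [order[:i] for i in range(1, len(order))]
    best = min(prefixes, key=lambda S: conductance(adj, S))
    return set(best), conductance(adj, best)

# Hypothetical toy network: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
best_set, phi = sweep_cut(adj, diffuse(adj, start=0))
print(best_set, phi)
```

On this graph the sweep recovers the triangle that contains the start node, cut off from the rest of the network by the single bridging edge.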

Modularity Maximization. 

If $C(A) = DD - A$, where $DD[i,j] = \frac{d[i]\,d[j]}{2m}$, $d[i]$ is
the degree of node $i$, $d[j]$ is the degree of node $j$, and $m$
is the total number of edges, and if $X[\cdot,i]$, $\forall i$ is used for
clustering, then the model reduces to the modularity maximization
problem using the eigenvector approach [12].
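A minimal sketch of the eigenvector approach mentioned above: since $C(A) = DD - A$ is the negative of the modularity matrix $B = A - DD$, the eigenvector of $C$ with the smallest eigenvalue is the leading eigenvector of $B$, and the signs of its entries give a two-way split. The power iteration with a Gershgorin shift, the toy graph, and the function name are choices made for this example, not taken from the original study.

```python
def leading_modularity_vector(A, iters=500):
    """Power iteration for the top eigenvector of B = A - d d^T / (2m)."""
    n = len(A)
    d = [sum(row) for row in A]
    two_m = sum(d)
    B = [[A[i][j] - d[i] * d[j] / two_m for j in range(n)] for i in range(n)]
    # Gershgorin shift so that every eigenvalue of B + shift * I is non-negative.
    shift = max(sum(abs(x) for x in row) for row in B)
    v = [float(i + 1) for i in range(n)]  # arbitrary non-symmetric start vector
    for _ in range(iters):
        w = [shift * v[i] + sum(B[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = max(abs(x) for x in w)
        v = [x / norm for x in w]
    return v

# Hypothetical toy network: two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = [[0] * 6 for _ in range(6)]
for i, j in edges:
    A[i][j] = A[j][i] = 1

v = leading_modularity_vector(A)
positive = {i for i, x in enumerate(v) if x > 0}
print(positive)
```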

In the section below we use interaction models to identify
community structure that emerges en route to the steady
state in real-world networks. We find that conservative and
non-conservative interaction models lead to similar multi-scale
organization of the network, but the composition of
communities found at different scales is markedly different.

INTERACTION DYNAMICS AND COMMUNITY STRUCTURE

A community is a group of nodes who are more similar 
to each other than to other nodes. Some network commu- 
nity detection approaches like conductance measure similar- 
ity by the number (or fraction) of edges linking nodes to 
other nodes within the same community p\ . The interaction 
models allow us to define communities dynamically. Given 
a network of nodes with random initial states ($\theta_i(t = 0)$),
we allow the system to evolve according to the rules of the
interaction model. As Arenas et al. [2] observed, as nodes
interact, their phases (or extrinsic properties) become more 
similar, with nodes within the same community becoming 
more similar to each other faster than nodes from different
communities. This happens in stages that reveal the
network's hierarchical community structure. In this section we
define a new similarity function and describe a hierarchi- 
cal clustering algorithm that uses it to identify a network's 
community structure. 

Similarity Measure. 

We assume that when nodes are similar, further interactions
between them do not change their extrinsic property,
which is given by the dynamic variable $\theta_i(t)$. Maximal
similarity is reached at time $t_{eq}$, when the equilibrium or
steady state is reached. In the conservative model in Eq. 3
the steady state corresponds to global synchronization, in
which every node has the same phase at any time if the
natural frequencies of all nodes are equal, i.e., $\omega_i = \omega$, $\forall i$.
The steady state of the non-conservative model is given by
the eigenvector of $R$ with eigenvalue zero (equivalently, the
eigenvector of the adjacency matrix $A$ with the largest eigenvalue)
when $\omega_i = 0$, $\forall i$. For the sake of convention, we call this
state the synchronized state, even if the values of all $\theta_i$s are
not the same (but they do have fixed values, given by the
first eigenvector). Once the system reaches synchronization,
$\theta_i(t+1) = \theta_i(t)$ for all subsequent times.

Arenas et al. used the cosine of the phase difference between
nodes as the measure of similarity. However, such a measure
will lead to finite differences between nodes in the steady
state in the non-conservative model. Instead, we measure
similarity by the relative difference of the variables in the
synchronized state. In other words, similarity between nodes
$i$ and $j$ at time $t$ is

$$sim(i,j,t) = \cos\left(\theta_i(t) - \frac{\theta_i^{eq}}{\theta_j^{eq}}\,\theta_j(t)\right)$$

where $\theta_i^{eq}$ is the value of the dynamic variable in the steady
state. Therefore for both the conservative and non-conservative
interaction models, $sim(i,j,t) = 1$, $\forall i,j \in V$ at $t \ge t_{eq}$.

In the conservative case, the similarity measure we propose
reduces to the one used by Arenas et al., because in the
conservative steady state $\theta_i^{eq} = \theta_j^{eq}$; therefore,
$sim(i,j,t) = \cos(\theta_i(t) - \theta_j(t))$.
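A small numerical check of the similarity measure may help. The sketch below evaluates it at a hypothetical non-conservative steady state, where the plain cosine of the phase difference stays below 1 but the rescaled measure equals 1. The steady-state values 0.6 and 0.3 are made up for the example.

```python
import math

def sim(theta_i, theta_j, eq_i, eq_j):
    """Similarity of two nodes, rescaling theta_j by the ratio of steady-state values."""
    return math.cos(theta_i - (eq_i / eq_j) * theta_j)

# Hypothetical steady-state values of two nodes in a non-conservative model.
eq_i, eq_j = 0.6, 0.3

plain = math.cos(eq_i - eq_j)            # plain cosine at the steady state, below 1
rescaled = sim(eq_i, eq_j, eq_i, eq_j)   # rescaled measure at the steady state
print(plain, rescaled)
```

When the two steady-state values are equal, as in the conservative case, the ratio is 1 and `sim` reduces to the cosine of the phase difference.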

Hierarchical Community Detection. 

We simulate the interaction model by letting the network
evolve from some initial configuration. At any time $t < t_{eq}$,
we can find the structure of the evolving network by executing
a clustering algorithm, e.g., the average link hierarchical
agglomerative algorithm, with the similarity calculated as
shown above.

The hierarchical structure of the network can be captured
by a dendrogram. However, a complete dendrogram may
be difficult to visualize, especially for large networks. Instead,
we use a coarse-graining strategy to cluster nodes if
their similarity is above some threshold. Algorithm 1 describes
the clustering procedure that takes the similarity threshold
parameter $\mu$ as input, and at time $t$ finds all communities in the
network, such that if $i \in C_i$, $\max_{j \in C_i}(sim(i,j,t))$ is more
than or equal to $1 - \mu$. Since by construction, in Algorithm 1,
for every $i \in C_i$ there exists a $j \in C_i$ with $1 - \mu \le
sim(i,j,t) \le \max_{j \in C_i}(sim(i,j,t))$, in all communities
output by this algorithm, for all nodes $i \in C_i$, the similarity
$\max_{j \in C_i}(sim(i,j,t)) \ge (1 - \mu)$. This algorithm has
linear runtime, $O(|E|)$, where $|E|$ is the number of edges.
By changing $\mu$, we can change the number and size of clusters.
As $\mu$ decreases, the merge threshold $1 - \mu$ rises and a cluster
fragments into sub-clusters, and thus a hierarchical arrangement
of the clusters can be found. The set of communities output by
Algorithm 1 at time $t$, for a given $\mu$, is unique and independent
of the order in which edges $e(i,j) \in E$ are considered (proof
omitted due to space constraints).



Algorithm 1 Communities at time $t$ with similarity threshold $\mu$

Input
    $K$: number of simulations of the interaction model $X$
    $t$: time at which the hierarchy of the evolving communities is calculated
    $\theta_i(t)[k]$: $\theta_i(t)$ from the $k$th simulation
    $\theta_i(t) = (\theta_i(t)[1], \theta_i(t)[2], \ldots, \theta_i(t)[K])$
    $\mu$: similarity threshold
    $G(V,E)$: network with $|V|$ nodes, $|E|$ edges
    $e(i,j)$: edge between $i$ and $j$
Output
    Communities $\{C_i\}$ such that $\forall i \in V$,
    $\max_{j \in C_i}(sim(i,j,t)) \ge (1 - \mu)$ in the interaction model $X$.
Initialize
    $S = E$
    Assign each node $i$ to a separate community $C_i \in C$.
repeat
    for each $e(i,j) \in E$ do
        $sim(i,j,t) = \frac{1}{K} \sum_{v=1}^{K} \cos\left(\theta_i(t)[v] - \frac{\theta_i^{eq}[v]}{\theta_j^{eq}[v]}\,\theta_j(t)[v]\right)$
        $S = S - \{e(i,j)\}$
        if $sim(i,j,t) \ge (1 - \mu)$ then
            Merge $C_i$ and $C_j$
        end if
    end for
until $S = \emptyset$
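One possible way to implement Algorithm 1 is a single pass over the edges with a union-find structure to merge communities. The sketch below assumes nodes are labeled `0..n-1`, that `theta[k][i]` and `theta_eq[k][i]` hold the phase and steady-state value of node `i` in simulation `k`, and that the phases shown are made-up numbers. The function names and the union-find choice are assumptions, not part of the original algorithm statement.

```python
import math

def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x

def communities_at_time(edges, theta, theta_eq, mu):
    """Merge the endpoints of every edge whose averaged similarity is at least 1 - mu."""
    K, n = len(theta), len(theta[0])
    parent = list(range(n))
    for i, j in edges:
        s = sum(math.cos(theta[k][i] - theta_eq[k][i] / theta_eq[k][j] * theta[k][j])
                for k in range(K)) / K
        if s >= 1 - mu:
            parent[find(parent, i)] = find(parent, j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(parent, i), set()).add(i)
    return sorted(groups.values(), key=min)

# Hypothetical input: two simulations on two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
theta = [[1.00, 1.02, 0.99, 2.00, 2.01, 1.98],
         [-0.50, -0.49, -0.52, 0.60, 0.58, 0.61]]
theta_eq = [[1.0] * 6, [1.0] * 6]  # equal steady-state values (conservative case)

fine = communities_at_time(edges, theta, theta_eq, mu=0.05)
coarse = communities_at_time(edges, theta, theta_eq, mu=0.6)
print(fine, coarse)
```

With the small `mu` only nodes inside each triangle are merged; with the larger `mu` the bridging edge also passes the threshold and a single community results.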



Fast and scalable. 

The decentralized nature of the interaction models allows
each node $i$ to compute $\theta_i$ locally, interacting with at most
$d[i]$ of its neighbors, which helps us to parallelize the computation
process, making it fast and scalable. Due to the linear
nature of the interaction models considered, Eq. 9 (with $\omega = 0$)
can easily be rewritten as $\frac{d\theta}{dt} = -\sum_{i} K \cdot C(A)\,\theta^{(i)}$, where
$\theta^{(i)}$ evolves from the initial vector $\theta_0^{(i)}$ with
$\theta_0^{(i)}[i] = \theta_0[i]$ and $\theta_0^{(i)}[j] = 0 \ \forall j \ne i$, $\theta_0$ being the initial
starting vector in Eq. 10. Each of the $|V|$ terms of this model
can be calculated independently, increasing parallelizability
further.
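The independence of the $|V|$ terms follows from linearity, which can be checked directly: evolving each single-node initial vector separately and summing the results reproduces the evolution of the full initial vector. The sketch uses the linearized conservative model with $\omega = 0$ and made-up parameters; dispatching the separate runs to different workers is not shown.

```python
def evolve(adj, theta0, c=1.0, dt=0.05, steps=200):
    """Forward-Euler integration of d(theta)/dt = -c L theta (Eq. 3 with omega = 0)."""
    theta = list(theta0)
    for _ in range(steps):
        theta = [th + dt * c * sum(theta[j] - th for j in adj[i])
                 for i, th in enumerate(theta)]
    return theta

# Hypothetical toy network: two triangles joined by the edge (2, 3).
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
theta0 = [0.5, -1.0, 2.0, -2.5, 1.5, 0.0]
n = len(theta0)

# One independent run per node, each started from a single non-zero entry.
parts = [evolve(adj, [theta0[i] if k == i else 0.0 for k in range(n)])
         for i in range(n)]
combined = [sum(part[k] for part in parts) for k in range(n)]
direct = evolve(adj, theta0)
print(max(abs(a - b) for a, b in zip(combined, direct)))
```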

EMPIRICAL STUDY 

We study the structure of real-world networks including
Digg and Facebook. We contrast the structure discovered
by the linearized Kuramoto model, given by Eq. 3, to that
discovered by the non-conservative interaction model, given
by Eq. 7. In each simulation, the initial phases of nodes are
drawn from a uniform random distribution over $[-\pi, \pi]$ and all
$\omega$s are set to 0. In the non-conservative interaction model
we took $\alpha = \lambda_{max}$, i.e., interaction models which reach
non-trivial equilibrium ($\theta_{t \to \infty}[i] \propto$ the eigenvector of the
adjacency matrix $A$ corresponding to $\lambda_{max}$). Investigation of
the differences between interaction models reaching trivial
equilibrium ($\theta_{t \to \infty}[i] = 0 \ \forall i$) and those reaching non-trivial
equilibrium is left for future work. We ran multiple simulations
($O(100)$) of each interaction model with different initial
conditions and used these as input to the structure detection
algorithms described in the previous section.

Karate Club 

We study the real-world friendship network of Zachary's
karate club [24], shown in Fig. 1(a), a widely studied social
network benchmark. During the course of the study, a
disagreement developed between the administrator and the
club's instructor, resulting in the division of the club into
two factions, represented by circles and squares, which are
taken as ground truth communities for this data set.

Figure 1(b) shows the spectra of the Laplacian and the
Replicator operators. Each spectrum contains the eigenvalues
of the operator, ranked in descending order, with the
largest eigenvalue in the first position. The time taken for
an interaction model to reach the steady state depends on
the smallest positive eigenvalue of the operator. Note that
the smallest positive eigenvalue of $R$ is larger than that
of $L$, implying that the non-conservative interaction model
reaches steady state faster than the conservative interaction
model. We observe this empirically in Figure 1(c) and (d),
which show the synchronization matrices of the network at
$t = 1000$ under the two interaction models. Each point in
the synchronization matrix represents the similarity of a pair
of nodes, with red squares corresponding to higher similarity
values and blue to lower. Clearly, nodes are more synchronized
in the non-conservative model.

Figure 2: Evolution of the discovered community
structure of the karate club network, as measured
by normalized mutual information, in the conservative
and non-conservative interaction models.

We used the average-link hierarchical clustering algorithm to
cluster the network at different times, using the
synchronization metric as the measure of similarity. Since the
ground truth communities of this network are known, we
adopt normalized mutual information MI as the metric for
evaluating the quality of discovered communities [4]. This
metric measures the amount of information about actual
communities that is given by the clusters found by the
algorithm. When MI = 1, the clusters are the actual communities
in the network, while for MI = 0, they are independent
of the actual communities. Figure 2 reports MI scores of
communities discovered at different times by the two interaction
models. The non-conservative model identifies communities
faster than the conservative model, and the discovered
communities are purer. The conservative model assigns nodes 10
and 15 to a community other than the one to which they
actually belong. The non-conservative model also reveals a
rich structure with a hierarchy of sub-communities. Nodes
that are deeper within the hierarchy are more tightly
connected, while nodes higher up, such as nodes 9, 3, 14 and 20,
are the bridging nodes connected to both communities.
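
For reference, normalized mutual information between a ground-truth partition and a discovered partition can be computed as in the following sketch. It assumes the common normalization 2 I(X;Y) / (H(X) + H(Y)); the normalization used in the study is the one given in its cited reference and may differ. The function name and the toy labels are illustrative.

```python
from collections import Counter
from math import log

def normalized_mutual_information(truth, found):
    """NMI = 2 * I(X;Y) / (H(X) + H(Y)) for two labelings of the same nodes."""
    if len(truth) != len(found):
        raise ValueError("labelings must cover the same nodes")
    n = len(truth)
    count_x = Counter(truth)
    count_y = Counter(found)
    count_xy = Counter(zip(truth, found))

    def entropy(counts):
        return -sum((c / n) * log(c / n) for c in counts.values())

    mutual = sum(
        (c / n) * log((c * n) / (count_x[x] * count_y[y]))
        for (x, y), c in count_xy.items()
    )
    denominator = entropy(count_x) + entropy(count_y)
    # Both partitions trivial (a single cluster each): treat as identical.
    return 1.0 if denominator == 0 else 2 * mutual / denominator

# Toy labelings (hypothetical data) for four nodes.
truth = ["a", "a", "b", "b"]
same_up_to_renaming = [1, 1, 0, 0]
independent = [0, 1, 0, 1]

nmi_same = normalized_mutual_information(truth, same_up_to_renaming)
nmi_independent = normalized_mutual_information(truth, independent)
```

The score depends only on how nodes are grouped, not on the cluster names, which is why the renamed partition is treated as a perfect match.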

In both conservative and non-conservative models, community
membership of nodes does not change much beyond
t = 3899. However, the similarity of nodes increases until
the clustering procedure results in a trivial configuration, 
with every node equally similar to every other node. At this 
stage every node is assigned to the same community. 

Digg Mutual Follower Network 

Digg (http://digg.com) is a social news aggregator with 
over 3 million registered users. Users submit links to news 
stories and recommend them to other users by voting on, or 
digging, them. Of the tens of thousands of daily submissions, 
Digg picks about a hundred to feature on its popular front 
page. Digg also allows users to follow other users to see the 
new stories they have recently submitted or voted for. We
extracted data about all users who voted on stories that
were promoted to Digg's front page in June 2009, which
includes users followed by these voters. From this data, we
reconstructed the undirected mutual follower network, in which
an edge between A and B means that user A follows user B
and B follows A.
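
The reconstruction step can be sketched as follows. This is an illustration under assumptions, not the extraction code used for the Digg data: the input is taken to be a list of (follower, followee) pairs, and the function name and user names are hypothetical.

```python
def mutual_follower_edges(follows):
    """Return undirected edges {A, B} such that A follows B and B follows A.

    follows: iterable of (follower, followee) pairs; self-follows are ignored.
    """
    directed = {(a, b) for a, b in follows if a != b}
    return {frozenset((a, b)) for a, b in directed if (b, a) in directed}

# Toy follow list (hypothetical data).
follows = [
    ("ann", "bob"), ("bob", "ann"),   # reciprocated: kept
    ("ann", "cat"),                   # one-way: dropped
    ("cat", "dan"), ("dan", "cat"),   # reciprocated: kept
]
edges = mutual_follower_edges(follows)
```

Storing each edge as a `frozenset` makes the pair unordered, so a reciprocated link is counted once.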

This data set comprises around 40K nodes and more
than 360K edges. There are 4,811 disconnected components,
with the largest component comprising 70% of the nodes
(27K nodes) and 96% of the edges (352K edges). The second
largest component has 22 nodes. Since the inherent richness
of structure of this network is largely captured by the giant
component, we study this component in detail.
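
Component statistics of this kind can be obtained with a breadth-first search, as in the minimal sketch below. The function name, the optional `nodes` argument for isolated users, and the toy edge list are assumptions made for illustration.

```python
from collections import defaultdict, deque

def connected_components(edges, nodes=()):
    """Connected components of an undirected graph, largest first."""
    neighbors = defaultdict(set)
    for node in nodes:
        neighbors[node]  # register nodes that have no edges
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    seen, components = set(), []
    for start in list(neighbors):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), {start}
        while queue:
            for nxt in neighbors[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.add(nxt)
                    queue.append(nxt)
        components.append(component)
    return sorted(components, key=len, reverse=True)

# Toy graph (hypothetical data): a triangle, a pair, and an isolated node.
toy_edges = [(1, 2), (2, 3), (3, 1), (4, 5)]
components = connected_components(toy_edges, nodes=[6])
giant = components[0]
node_share = len(giant) / sum(len(c) for c in components)
edge_share = sum(a in giant and b in giant for a, b in toy_edges) / len(toy_edges)
```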

Using the Jacobi-Davidson algorithm for calculating eigenvalues,
we compute more than 6K of the smallest
eigenvalues of the Replicator and Laplacian operators and
rank them in descending order (Fig. 3(a)). The two spectra
are dramatically different. The smallest positive eigenvalue of
L is much smaller than that of R. This indicates that the
non-conservative interaction model reaches the steady state
much faster than the conservative model.

Multi-scale Structure of Digg 

We use Algorithm 1 to cluster nodes at different resolutions
specified by the similarity threshold μ. While the overall
structure changes over time, we find an intricate multi-scale
organization of the network in both interaction models.
At every resolution, we find a 'core and whiskers' organization
[11], with one giant community (core) and many small
communities (whiskers). The core itself has a well-defined
structure: as we tighten the similarity threshold μ, the core
fragments into another large core and many small communities
with a long-tailed size distribution. This process continues
until the core fragments into some number of small
communities.

The community structure of Digg, therefore, resembles an
onion, with multiple layers of whiskers. This paradigm is
captured in Figure 4(a), which shows core sizes at different
resolution scales at time t = 100. At later times, at any
given resolution the core grows until t = t_eq, when it forms
a giant component for every resolution scale. However, the
composition of the core remains almost time invariant, i.e.,
the core at a coarser resolution at time t_1 is very similar to
a core at some finer resolution at a later time t_2. We chose the
threshold parameters μ that give comparable size cores at
each resolution scale for the interaction models.

Figure 1: Analysis of the karate club network. (a) Friendship graph. (b) Comparison of eigenvalues of the
Laplacian and Replicator operators. Synchronization matrix at time t = 1000 due to (c) the conservative
interaction model and (d) the non-conservative interaction model. The color of each square indicates how
similar two nodes are (zoom in to see node labels), with red corresponding to more similar nodes and blue
to less similar nodes.

Using the non-conservative interaction model, all thresholds
above μ = 0.0004 produce a single component with
about 27K nodes. At a finer resolution (smaller μ), the number
of communities increases. As illustrated in Figure 4(a),
at μ = 0.00018, 76% of these nodes form a giant component,
or the core. In addition, there are several small communities,
whose sizes have a long-tailed distribution (Fig. 3(b)).
At μ = 0.00016, the core again divides into one large community,
with 72% of the nodes, and many small communities,
whose sizes also have a long-tailed distribution, as
shown in Figure 3(b). Tightening the threshold further
to μ = 0.00014, we discover that the core found at
μ = 0.00016 breaks down once more into one giant component
comprising 62% of the nodes, and so on. A similar
organization is discovered using the conservative model,
though at later times.

While the onion-like organization discovered by both interaction
models is similar, its composition is different.
Figure 4(a) shows the overlap of the membership of comparable-size
cores found by the two models. For example, the size of
the giant component discovered by the non-conservative interaction
model for μ = 0.00018 is comparable to the size of
the core discovered by the conservative interaction model for
μ = 0.2; however, they share only about 80% of the nodes.
Core overlap decreases to about 40% at μ = 0.00014 for the
non-conservative interaction model (μ = 0.008 for the conservative
model), and keeps decreasing as we refine the resolution
scale. Finally, the largest components at μ = 0.00008
for the non-conservative and μ = 0.0001 for the conservative model
(resolution scale 1) do not have any nodes in common.
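
A membership overlap of this kind can be computed with a set intersection. The text does not state how the shared fraction is normalized, so the sketch below assumes it is taken relative to the smaller of the two cores; the function name and the toy cores are hypothetical.

```python
def core_overlap(core_a, core_b):
    """Fraction of the smaller core's nodes that also appear in the other core.

    The choice of the smaller core as the denominator is an assumption.
    """
    a, b = set(core_a), set(core_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))

# Toy cores (hypothetical data) of equal size sharing four of five nodes.
overlap_high = core_overlap({1, 2, 3, 4, 5}, {2, 3, 4, 5, 6})
overlap_none = core_overlap({1, 2}, {3, 4})
```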

Empirical Evaluation 

While the two interaction models discover different struc- 
tures in the Digg network, in the absence of ground truth 
communities for this network, it is challenging to say which 
model is correct. However, user activity provides an indepen- 
dent source of evidence for evaluating the quality of com- 
munities. We use this evidence to gain more insight into 
the structure of the Digg network, and show that the non- 
conservative model is better suited for studying it. 



We propose an empirical measure of community quality
based on user activity. Members of the same community
are likely to share the same information, interests, and
attributes [7]. As a consequence, they are likely to behave in
a similar manner, which on Digg translates into voting for
the same news stories. We measure the similarity of two Digg
users by the number of stories for which they both voted,
i.e., co-votes. Then, averaging over the co-votes of all pairs of
community members, we obtain a number that quantifies
the quality of the community. We focus on small components
(whiskers) of size at least three isolated from the core
at different resolutions. The non-conservative interaction model
assigned 3,712 users to such small communities. In contrast,
the conservative interaction model assigned just 449 users
to small communities. The rest of the users fragmented into
isolated pairs or singletons.
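
The co-vote quality measure described above can be sketched as follows. The data layout (a dictionary from user to the set of story ids that user voted for), the function name, and the toy votes are assumptions made for illustration.

```python
from itertools import combinations

def covote_quality(members, votes):
    """Average number of co-voted stories over all pairs of community members.

    votes: dict mapping user -> set of story ids the user voted for.
    """
    pairs = list(combinations(list(members), 2))
    if not pairs:
        raise ValueError("a community needs at least two members")
    total = sum(len(votes.get(u, set()) & votes.get(v, set())) for u, v in pairs)
    return total / len(pairs)

# Toy community of three users (hypothetical data).
votes = {"u1": {1, 2, 3}, "u2": {2, 3, 4}, "u3": {3, 5}}
quality = covote_quality(["u1", "u2", "u3"], votes)
```

Here the three pairs share 2, 1 and 1 stories, so the community quality is 4/3.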

Figure 5(a) shows the number of small communities resolved
by the two interaction models at different scales.
Figure 5(b) reports the average community quality at each
resolution scale, as measured by the number of co-votes between
pairs of community members. Community quality increases
at finer resolution scales, producing tighter communities in
the center of the 'onion', as expected. Members of the
innermost communities (resolution scale 1) are much more
similar than members of the outer communities (resolution
scales 5, 6). Except for these innermost communities, the
average quality of communities found by the non-conservative
model is better than that found by the conservative model.
The difference at resolution scale 1 is driven by two outliers
in the conservative model. The first of these is a community
of 26 users, with more than 300 co-votes on average,
and the other is a community of nine with more than 600
co-votes. In addition to co-voting on an extraordinary number
of stories (600 is nearly 20% of all stories in our data
set), these users are also highly interlinked. The first group
forms a 13-core (a cluster in which each node is linked to
at least 13 other nodes), and the second group forms a
4-core. These users also share many friends. While we cannot
say whether these groups represent the often-rumored voting
blocs on Digg, their activity does appear to be anomalous.
One way such activity could arise is if each member of the
group navigated to the profiles of other group members and
voted for the stories that appeared on that profile, e.g., the
stories that member submitted or voted for. Such browsing
can be represented by one-to-one interactions; therefore,
the conservative model is best at finding it. The non-conservative
model describes information diffusion through broadcasts
of recent votes to followers, and finds communities arising
from this information-sharing behavior. To summarize, the
non-conservative model finds many more small communities of
higher quality than the conservative model, though the latter
seems to pick out some anomalous groups of users.

Figure 3: (a) Top 6000 eigenvalues of the Replicator
and Laplacian operators of the Digg friendship
network. (b) The long-tailed distribution (using
logarithmic binning) of the components comprising the
core for different similarity thresholds μ for the
non-conservative model.
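
The k-core notion used above for the two anomalous groups can be checked by repeatedly removing nodes with too few remaining neighbors. The sketch below follows the standard definition of the k-core as the maximal subgraph in which every node has at least k neighbors inside the subgraph; the function name, the adjacency-dictionary layout, and the toy graph are illustrative assumptions.

```python
def k_core(adjacency, k):
    """Nodes of the k-core: the maximal subgraph with all degrees >= k.

    adjacency: dict mapping node -> set of neighbors (undirected graph).
    """
    alive = set(adjacency)
    changed = True
    while changed:
        changed = False
        for node in list(alive):
            # Count only neighbors that have not been peeled away yet.
            if len(adjacency[node] & alive) < k:
                alive.discard(node)
                changed = True
    return alive

# Toy graph (hypothetical data): a 4-clique with one pendant node attached.
graph = {
    1: {2, 3, 4, 5},
    2: {1, 3, 4},
    3: {1, 2, 4},
    4: {1, 2, 3},
    5: {1},
}
core3 = k_core(graph, 3)
core4 = k_core(graph, 4)
```

A group forms a k-core in this sense when the function returns all of its members for the subgraph induced by the group.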

Facebook 

We performed our analysis on a data set containing a
snapshot of the Facebook networks as of September 2005 [19,
20]. Each user in this data set has four descriptive features:
status (e.g., student, faculty, staff, and so on), major, dorm
or house, and graduation year. We use these features to
empirically evaluate the quality of the discovered communities,
in the sense that a good community should consist of individuals
who are similar according to these features. While this
data set contains more than 100 colleges and universities,
we present here the analysis of the network for American
University, which comprises 6,386 nodes and more than
200K edges.

Multi-scale Structure of Facebook 

We use Algorithm 1 to cluster nodes at different resolution
scales specified by the similarity threshold μ. As with
Digg, we find an onion-like, multi-scale organization in the
structures discovered by the conservative and non-conservative
interaction models for the Facebook network, underscoring
the generality of the observed structure. At each resolution,
we discover a giant community (core) and many small
communities (whiskers) with a long-tailed size distribution
(Fig. 6(a)). Just as on Digg, there is little overlap in
membership between cores found by the two interaction models
at finer resolutions (Fig. 4(b)).

Figure 4: Number of nodes comprising the
core at different levels of hierarchy (resolution
scales) found by the interaction models in the Digg
and Facebook networks. The resolution scales
correspond to similarity thresholds that give cores of
comparable size. The green line shows the number
of nodes that the cores have in common at that
resolution scale.

Figure 5: Evaluation of communities found in the
Digg mutual follower graph at t = 100 by the two
interaction models. (a) Number of small communities
found at different resolutions specified by the
similarity threshold parameter. The smallest resolution
corresponds to the smallest value of the similarity
threshold. (b) Average quality of communities at
each scale, as measured by the number of co-votes.

As on Digg, many nodes participate in small, clique-like
communities. However, while 1,320 nodes contribute to the
formation of such communities in the non-conservative
interaction model, only 32 nodes participate in such communities
in the conservative interaction model. The remaining
users are fragmented into isolated pairs or singletons. As
in the Digg data set, the non-conservative model found many
more communities than the conservative model. Figure 6(b)
shows the number of communities discovered at each resolution
scale for the conservative and non-conservative models.

Empirical Evaluation 

We measure the quality of the communities discovered at
different resolution scales using the four features enumerated
above, namely major, dorm, year and category of individual.
We measure the prevalence of the most popular value
of some feature among community members. If the community
is pure, the quality will be high. For example, the
quality of a community with respect to the dorm feature
gives the largest fraction of community members that belong
to the same dorm. Figure 7 reports the quality of communities
found by the two models at different resolution
scales with respect to those features. We find that, overall,
the quality increases as we tighten the similarity threshold
(decrease the resolution scale), irrespective of the feature
under consideration. However, the characteristics of the
community structure discovered by the conservative and
non-conservative interaction models vary significantly. At finer
resolution scales, the non-conservative model finds communities
of individuals who are more likely to have the same major
and belong to the same dorm. The conservative model, on the
other hand, is more likely to put into the same community
individuals who belong to the student category and are in
the same year. Though the type of interactions may differ
from college to college, it is reasonable to assume that students
who belong to the same year will have more face-to-face
(conservative) interactions, while students who have the
same major or live in the same dorm may meet in study groups
or organized events, increasing chances for one-to-many
(non-conservative) interaction.

Figure 6: Distribution of communities in the Facebook
network for American University at t = 100.
(a) Comparison of size distribution of small communities
found by the two interaction models at the
coarsest and finest resolution scales. (b) Number
of small communities at different resolutions. The
smallest resolution corresponds to highest similarity
between individuals.
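
The feature-based quality of a single community, i.e. the prevalence of its most popular feature value, can be sketched as below. The sketch assumes every member has a recorded value for the feature; how missing values were handled in the study is not stated here. The function name and the toy dorm labels are hypothetical.

```python
from collections import Counter

def feature_purity(values):
    """Fraction of community members sharing the most common feature value."""
    values = list(values)
    if not values:
        raise ValueError("community has no members")
    most_common_count = Counter(values).most_common(1)[0][1]
    return most_common_count / len(values)

# Toy community (hypothetical data): dorm of each of four members.
dorm_purity = feature_purity(["north", "north", "south", "north"])
```

Averaging this value over all communities found at one resolution scale would give one point of a per-feature quality curve.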

In summary, regardless of the interaction process, we observe
a roughly scale-invariant organization in the real-world
social networks. At almost every resolution scale, we find a
large component and many small components with a long-tailed
size distribution. Thus, the structure of both Digg and Facebook
resembles an onion. Peeling each layer reveals another,
almost self-similar structure with a core and many smaller
communities. However, the composition of communities depends
on the interaction process, and is different for the
conservative and non-conservative interaction models.



RELATED WORK 

Community detection is an extremely active research area,
with a variety of methods proposed, including spectral clustering,
graph partitioning and modularity maximization.
We show in this paper that these methods can be expressed
in terms of the generalized linear interaction model that assumes
a specific type of interaction. We also demonstrate
that to get the full picture of a network's emergent structure,
a community detection method must account for the dynamic
process occurring on the network.

Figure 7: Evaluation of communities found in the
Facebook network of American University at t = 100
by the two interaction models. Average quality of
communities at each resolution scale, as measured
by the probability of occurrence of the most frequent
value of features major, dorm, year, category
of individual.

It might be argued that taking interactions into account
eventually leads to a weighted graph, and off-the-shelf community
detection algorithms for weighted graphs [5] might
be applied. However, as in the unweighted case, applying
a community detection method to a weighted graph without
taking the nature of the interaction process into account
might lead to unsatisfactory results. For example, if a
conductance minimization algorithm is applied to a weighted
graph whose weights are a consequence of a non-conservative
process, the structure detected might differ significantly from
ground truth. Our method, on the other hand, learns the
weights from the interaction process and then detects structure
dynamically. Learning the underlying interaction process
from the activity logs of nodes of the network and using
this process to determine the community structure is the
subject of future work.

Several community detection methods implicitly take dynamic
interactions into account. These include spin models,
random walk models and synchronization. Spin models [22]
imply that the interaction is ferromagnetic, i.e., it favors spin
alignment. As we show in this paper, random walk and Kuramoto
synchronization models [10] are both conservative in
nature, with the former expressed in terms of the normalized
Laplacian, and the latter in terms of the graph Laplacian.
Arenas et al. [2] studied the relationship between topological
and community structure of complex networks using the
Kuramoto model of synchronization. They created a threshold
graph at some point in time where an edge exists between
nodes only if their similarity exceeds some threshold.
They defined communities as disconnected components of
the threshold graph. We, on the other hand, explore different
types of interactions and show how these reveal different
hierarchical community structures in real-world complex
networks. We also introduce a process-independent similarity
metric. Hu et al. [8] found communities based on signaling
interactions. They described the interactions by an
operator C(A) = (I + A) and used K-means clustering and
F-statistics to find the optimal clusters at some point in
time. However, it can be shown mathematically that the
process they defined will never reach a steady state. Our
non-conservative interaction model treats signaling interactions
in a principled way.
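
The threshold-graph construction described above for Arenas et al. can be sketched as follows: link two nodes when their similarity exceeds the threshold, then read communities off as connected components. This is a minimal illustration of that description, not the clustering procedure of this paper; the function name, the union-find implementation, and the toy similarity matrix are assumptions.

```python
def threshold_communities(similarity, threshold):
    """Connected components of the graph linking i and j
    whenever similarity[i][j] > threshold, largest first."""
    n = len(similarity)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if similarity[i][j] > threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values(), key=len, reverse=True)

# Toy symmetric similarity matrix (hypothetical data) with two tight pairs.
similarity = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
coarse = threshold_communities(similarity, 0.05)
medium = threshold_communities(similarity, 0.5)
fine = threshold_communities(similarity, 0.95)
```

Sweeping the threshold from loose to strict splits one component into progressively smaller ones, which is the basic mechanism behind a multi-resolution view.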

Community detection methods are used to reveal the structure
of complex networks. Leskovec et al. [11] found a 'core and
whiskers' structure in real-world networks using conductance-based
methods and argued that these methods cannot reveal
any further structure in the giant core. Song [16] claimed
that there exist self-repeating patterns in complex networks
at all length scales. Our results corroborate this claim, as
we show a repeating 'core and whiskers' pattern in the Digg
social network at many different length scales.

It can be shown that some of the interaction models described
above not only solve certain regularized Semi-Definite
Programs but also give fast solutions to these problems [13].

CONCLUSION 

Our work highlights the importance of dynamic interactions
in the analysis of network structure and provides a
framework for unifying some of the existing community detection
methods. We argue that in order to understand network
structure, not only its topology but also the nature of
interactions between nodes should be taken into consideration.
We have proposed a novel non-conservative interaction
model inspired by distributed synchronization of a network
of coupled oscillators. We also presented a new formulation
of similarity, which we used in multi-scale analysis of network
structure, and an activity-based metric to measure the
quality of communities in a real-world network. Our decentralized
approach to community detection is fast and
scalable.

Our study of the community structure of real-world so- 
cial networks revealed a complex 'onion'-like organization. 
Peeling each level of hierarchy gives a core and many small 
components, regardless of the interaction model. However, 
different interactions lead to different views of this multi- 
scale organization. 

In the future, we would like to investigate the effect of
non-zero ω on the interaction models. Also, we would like to
investigate the ergodicity of interaction models and the spectral
properties of the different operators. Our work offers a
framework for understanding the role of dynamic processes
in the measurement of network structure. We hope that our
investigations inspire others to explore the relationships between
network structure, topology and dynamics.